Improving a Catalan-Spanish Statistical Translation System using Morphosyntactic Knowledge
Abstract
In this paper, a human evaluation of a Catalan-Spanish Ngram-based statistical machine translation system is used to develop specific techniques, based on grammatical categories, lexical categorisation and text processing, that enhance the final translation. The final automatic evaluation shows that the system improves when tested on both ad hoc and general corpora.
Similar resources
Spanish-Catalan Translator Using Statistical Methods
The development of a Spanish-Catalan statistical machine translation system is described in this paper. The methodology attempts to solve the problem with a purely inductive approach, without using linguistic knowledge. To obtain the translator, we perform the following steps: First, we obtain a bilingual corpus from the Internet. Second, we fragment the corpus into units (sentences and token...
Automatic and Human Evaluation Study of a Rule-based and a Statistical Catalan-Spanish Machine Translation Systems
Machine translation systems can be classified into rule-based and corpus-based approaches, in terms of their core technology. Since both paradigms have largely been used during the last years, one of the aims in the research community is to know how these systems differ in terms of translation quality. To this end, this paper reports a study and comparison of a rule-based and a corpus-based (pa...
Catalan-English Statistical Machine Translation without Parallel Corpus: Bridging through Spanish
This paper presents a full experiment on large-vocabulary Catalan-English statistical machine translation without an English-Catalan parallel corpus, in the context of the debates of the European Parliament. For this, we make use of an English-Spanish European Parliament Proceedings parallel corpus and a Spanish-Catalan general newspaper parallel corpus, both of which of more than 30 M words. G...
Catalan-English statistical machine translation without a parallel corpus
This paper presents a full experiment on large-vocabulary Catalan-English statistical machine translation without an English-Catalan parallel corpus, in the context of the debates of the European Parliament. For this, we make use of an English-Spanish European Parliament Proceedings parallel corpus and a Spanish-Catalan general newspaper parallel corpus, both of which of more than 30 M words. G...
A Large Spanish-Catalan Parallel Corpus Release for Machine Translation
We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The produced corpus of 7.5 M parallel sentences (around 180 M words per language) is useful for many natural language applications. We report excellent results when building a statistical machine translation system trained on this parallel corpus. The Spanish-Catala...